Smart Paper Notes

Entity Anchored ICD Coding

Jay DeYoung, Han-Chin Shing, Luyang Kong, Christopher Winestock, Chaitanya Shivade

Categories: Machine Learning | Natural Language Processing

2022-08-15

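Medical coding is a complex task that requires assigning a subset of over 72,000 ICD codes to a patient's notes. Modern natural language processing approaches to this task have been challenged by the length of the input and the size of the output space. We limit our model inputs to a small window around medical entities found in the documents. From these local contexts, we build contextualized representations of both ICD codes and entities, and aggregate over these representations to form document-level predictions. In contrast to existing methods, which use a representation fixed either in size or by the codes seen in training, we represent ICD codes by encoding the code description with local context. We discuss metrics appropriate to deploying coding systems in practice. We show that our approach is superior to existing methods on both standard and deployable measures, including performance on rare and unseen codes.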
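
The windowing and aggregation steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the window radius, the use of max-pooling as the aggregation operator, and the per-window scores are all assumptions made for the example, and the helper names `entity_windows` and `aggregate_scores` are hypothetical.

```python
from typing import Dict, List


def entity_windows(tokens: List[str], entity_positions: List[int], radius: int) -> List[List[str]]:
    """Return the tokens within `radius` positions of each entity mention."""
    windows = []
    for pos in entity_positions:
        start = max(0, pos - radius)
        end = min(len(tokens), pos + radius + 1)
        windows.append(tokens[start:end])
    return windows


def aggregate_scores(per_window_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """Max-pool per-window code scores into one document-level score per code.

    Max-pooling is an assumed aggregation choice for this sketch.
    """
    doc_scores: Dict[str, float] = {}
    for scores in per_window_scores:
        for code, score in scores.items():
            doc_scores[code] = max(score, doc_scores.get(code, float("-inf")))
    return doc_scores


tokens = "patient admitted with acute chest pain and a history of type 2 diabetes".split()
# Entity mentions at "pain" (index 5) and "diabetes" (index 12).
windows = entity_windows(tokens, entity_positions=[5, 12], radius=2)

# Made-up scores standing in for a model's output on each window.
per_window_scores = [
    {"R07.9": 0.80, "E11.9": 0.10},
    {"R07.9": 0.05, "E11.9": 0.90},
]
doc_scores = aggregate_scores(per_window_scores)
```

Scoring each window independently keeps the model input short regardless of document length, which is the motivation the abstract gives for anchoring on entities.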


MAViL: Masked Audio-Video Learners

Po-Yao Huang, Vasu Sharma, Hu Xu, Chaitanya Ryali, Haoqi Fan, Yanghao Li, Shang-Wen Li, Gargi Ghosh, Jitendra Malik, Christoph Feichtenhofer

Categories: Computer Vision

2022-12-15

We present Masked Audio-Video Learners (MAViL) to train audio-visual representations. Our approach learns with three complementary forms of self-supervision: (1) reconstruction of masked audio and video input data, (2) intra- and inter-modal contrastive learning with masking, and (3) self-training by reconstructing joint audio-video contextualized features learned from the first two objectives. Pre-training with MAViL not only enables the model to perform well in audio-visual classification and retrieval tasks but also improves representations of each modality in isolation, without using information from the other modality for fine-tuning or inference. Empirically, MAViL sets a new state-of-the-art on AudioSet (53.1 mAP) and VGGSound (67.1% accuracy). For the first time, a self-supervised audio-visual model outperforms ones that use external supervision on these benchmarks. Code will be available soon.


A Computer Vision Method for Estimating Velocity from Jumps

Soumyadip Roy, Chaitanya Roygaga, Nathaniel Blanchard, Aparna Bharati

Categories: Computer Vision

2022-12-09

Athletes routinely undergo fitness evaluations to assess their training progress. Typically, these evaluations require a trained professional who uses specialized equipment such as force plates. For the assessment, athletes perform drop and squat jumps, and key variables such as velocity, flight time, and time to stabilization are measured. However, amateur athletes may not have access to professionals or equipment that can provide these assessments. Here, we investigate the feasibility of estimating key variables using video recordings. We focus on jump velocity as a starting point because it is highly correlated with other key variables and is important for determining posture and lower-limb capacity. We find that velocity can be estimated with a high degree of precision across a range of athletes, with an average R-value of 0.71 (SD = 0.06).


Spatial Relation Graph and Graph Convolutional Network for Object Goal Navigation

D. A. Sasi Kiran, Kritika Anand, Chaitanya Kharyal, Gulshan Kumar, Nandiraju Gireesh, Snehasis Banerjee, Ruddra dev Roychoudhury, Mohan Sridharan, Brojeshwar Bhowmick, Madhava Krishna

Categories: Robotics | Artificial Intelligence

2022-08-27

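This paper describes a framework for the object-goal navigation task, which requires a robot to find and move to the closest instance of a target object class from a random starting position. The framework uses a history of robot trajectories to learn a Spatial Relation Graph (SRG) and Graph Convolutional Network (GCN)-based embeddings for the likelihood of proximity of different semantically labeled regions and the occurrence of different object classes in these regions. To locate a target object instance during evaluation, the robot uses Bayesian inference and the SRG to estimate the visible regions, and uses the learned GCN embeddings to rank the visible regions and select the region to explore next.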
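
As a rough illustration of the Bayesian reasoning over regions mentioned above, the sketch below applies Bayes' rule to a set of candidate regions and ranks them by posterior probability of containing the target. It is a simplification under stated assumptions, not the paper's procedure: the abstract uses Bayesian inference to estimate visible regions and learned GCN embeddings for ranking, whereas here the prior, the likelihoods, and the helper `region_posterior` are hypothetical stand-ins.

```python
from typing import Dict


def region_posterior(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: P(region | target) is proportional to P(target | region) * P(region)."""
    unnormalized = {region: prior[region] * likelihood.get(region, 0.0) for region in prior}
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("target has zero likelihood in every region")
    return {region: p / total for region, p in unnormalized.items()}


# Made-up numbers: prior belief about which region is nearby, and how often
# a mug was seen in each region type in past trajectories.
prior = {"kitchen": 0.5, "bedroom": 0.3, "bathroom": 0.2}
likelihood_mug = {"kitchen": 0.6, "bedroom": 0.1, "bathroom": 0.05}

posterior = region_posterior(prior, likelihood_mug)
ranked = sorted(posterior, key=posterior.get, reverse=True)
```

The ranked list gives an exploration order; in the framework described above, that ordering would come from the learned embeddings rather than from hand-set likelihoods.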


Multimodal Lecture Presentations Dataset: Understanding Multimodality in Educational Slides

Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency

Categories: Artificial Intelligence | Natural Language Processing | Computer Vision | Machine Learning

2022-08-17

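Lecture slide presentations, a sequence of pages containing text and figures accompanied by speech, are constructed and presented carefully in order to optimally transfer knowledge to students. Previous studies in multimedia and psychology attribute the effectiveness of lecture presentations to their multimodal nature. As a step toward developing AI that aids student learning as an intelligent teaching assistant, we introduce the Multimodal Lecture Presentations dataset as a large-scale benchmark for testing the capabilities of machine learning models in multimodal understanding of educational content. Our dataset contains aligned slides and spoken language for more than 180 hours of video and more than 9,000 slides, with 10 lecturers from various subjects (e.g., computer science, dentistry, biology). We introduce two research tasks designed as stepping stones toward AI agents that can explain (automatically captioning a lecture presentation) and illustrate (synthesizing visual figures to accompany spoken explanations) educational content. We provide manual annotations to help implement these two research tasks and evaluate state-of-the-art models on them. Comparing baselines with human student performance, we find that current models struggle with (1) weak cross-modal alignment between slides and spoken text, (2) learning novel visual mediums, (3) technical language, and (4) long-range sequences. To address this, we also introduce PolyViLT, a multimodal transformer trained with a multi-instance learning loss that is more effective than current approaches. We conclude by shedding light on the challenges and opportunities in multimodal understanding of educational presentations.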


Learning Modular Structures That Generalize Out-of-Distribution

Arjun Ashok, Chaitanya Devaguptapu, Vineeth Balasubramanian

Categories: Machine Learning | Artificial Intelligence

2022-08-07

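Out-of-distribution (O.O.D.) generalization remains a key challenge for real-world machine learning systems. We describe a method for O.O.D. generalization that, through training, encourages models to preserve only those features in the network that are well reused across multiple training domains. Our method combines two complementary neuron-level regularizers with a probabilistic differentiable binary mask over the network to extract a modular sub-network that achieves better O.O.D. performance than the original network. Preliminary evaluation on two benchmark datasets corroborates the promise of our method.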
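
One common way to make a binary mask both probabilistic and differentiable is the relaxed Bernoulli (binary Concrete) distribution, sketched below. The abstract does not say which relaxation the authors use, so this choice, the temperature value, and the helper `sample_relaxed_mask` are assumptions made purely for illustration.

```python
import math
import random
from typing import List


def stable_sigmoid(z: float) -> float:
    """Sigmoid that avoids overflow for large negative inputs."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sample_relaxed_mask(logits: List[float], temperature: float, rng: random.Random) -> List[float]:
    """Sample one relaxed Bernoulli mask value in [0, 1] per unit."""
    mask = []
    for logit in logits:
        u = min(max(rng.random(), 1e-9), 1.0 - 1e-9)  # keep both logs finite
        noise = math.log(u) - math.log(1.0 - u)        # Logistic(0, 1) sample
        mask.append(stable_sigmoid((logit + noise) / temperature))
    return mask


rng = random.Random(0)
logits = [6.0, -6.0, 0.0]  # learnable keep-scores: keep, drop, undecided
samples = [sample_relaxed_mask(logits, temperature=0.5, rng=rng) for _ in range(2000)]
mean_mask = [sum(s[i] for s in samples) / len(samples) for i in range(len(logits))]

# Applying one sampled mask to a layer's activations (made-up values).
activations = [0.7, 1.2, -0.4]
masked = [a * m for a, m in zip(activations, samples[0])]
```

Because the mask is a smooth function of the logits, gradients can flow into them during training; units whose logits end up strongly negative are effectively pruned, leaving a sub-network.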


Ultra-low latency recurrent neural network inference on FPGAs for physics applications with hls4ml

Elham E Khoda, Dylan Rankin, Rafael Teixeira de Lima, Philip Harris, Scott Hauck, Shih-Chieh Hsu, Michael Kagan, Vladimir Loncar, Chaitanya Paikara, Richa Rao

Categories: Machine Learning | Machine Learning (Statistics)

2022-07-01

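Recurrent neural networks have been shown to be effective architectures for many tasks in high energy physics, and thus have been widely adopted. Their use in low-latency environments has, however, been limited by the difficulty of implementing recurrent architectures on field-programmable gate arrays (FPGAs). In this paper we present an implementation of two types of recurrent neural network layers (long short-term memory and gated recurrent unit) within the hls4ml framework. We demonstrate that our implementation is capable of producing effective designs for both small and large models, and can be customized to meet specific design requirements for inference latency and FPGA resources. We show the performance and synthesized designs for multiple neural networks, many of which are trained specifically for jet identification tasks at the CERN Large Hadron Collider.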


Automated analysis of fibrous cap in intravascular optical coherence tomography images of coronary arteries

Juhwan Lee, Gabriel T. R. Pereira, Yazan Gharaibeh, Chaitanya Kolluru, Vladislav N. Zimin, Luis A. P. Dallan, Justin N. Kim, Ammar Hoori, Sadeer G. Al-Kindi, Giulio Guagliumi

Categories: Machine Learning | Computer Vision

2022-04-21

Thin-cap fibroatheroma (TCFA) and plaque rupture have been recognized as the most frequent risk factors for thrombosis and acute coronary syndrome. Intravascular optical coherence tomography (IVOCT) can identify TCFA and assess cap thickness, which provides an opportunity to assess plaque vulnerability. We developed an automated method that can detect lipidous plaque and assess fibrous cap thickness in IVOCT images. This study analyzed a total of 4,360 IVOCT image frames of 77 lesions among 41 patients. To improve segmentation performance, preprocessing included lumen segmentation, pixel-shifting, and noise filtering on the raw polar (r, theta) IVOCT images. We used the DeepLab-v3 plus deep learning model to classify lipidous plaque pixels. After lipid detection, we automatically detected the outer border of the fibrous cap using a special dynamic programming algorithm and assessed the cap thickness. Our method provided excellent discriminability of lipid plaque, with a sensitivity of 85.8% and an A-line Dice coefficient of 0.837. By comparing lipid angle measurements between two analysts following editing of our automated software, we found good agreement by Bland-Altman analysis (difference 6.7 ± 17 degrees; mean 196 degrees). Our method accurately detected the fibrous cap from the detected lipid plaque. Automated analysis required significant modification for only 5.5% of frames. Furthermore, our method showed good agreement of fibrous cap thickness between two analysts with Bland-Altman analysis (4.2 ± 14.6 microns; mean 175 microns), indicating little bias between users and good reproducibility of the measurement. We developed a fully automated method for fibrous cap quantification in IVOCT images, resulting in good agreement with determinations by analysts. The method has great potential to enable highly automated, repeatable, and comprehensive evaluations of TCFAs.
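
For readers unfamiliar with Bland-Altman analysis, the sketch below computes the usual summary for paired measurements: the mean difference (bias) and the 95% limits of agreement at bias ± 1.96 standard deviations. The thickness values are made up for illustration and are not taken from the study, and whether the "±" figures quoted above are standard deviations or limits of agreement is not stated in the abstract.

```python
from statistics import mean, stdev
from typing import List, Tuple


def bland_altman(a: List[float], b: List[float]) -> Tuple[float, float, float]:
    """Return (bias, lower limit, upper limit) of agreement for paired measurements."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally sized samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample standard deviation of the differences
    return bias, bias - spread, bias + spread


# Hypothetical cap thickness readings (microns) from two analysts on five frames.
analyst_1 = [170.0, 182.0, 165.0, 190.0, 176.0]
analyst_2 = [168.0, 185.0, 160.0, 188.0, 174.0]

bias, lower, upper = bland_altman(analyst_1, analyst_2)
```

A bias close to zero with narrow limits indicates that the two analysts can be used interchangeably, which is the sense in which the abstract reports good agreement.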


Local contrastive loss with pseudo-label based self-training for semi-supervised medical image segmentation

Krishna Chaitanya, Ertunc Erdil, Neerav Karani, Ender Konukoglu

Categories: Computer Vision | Artificial Intelligence | Machine Learning | Machine Learning (Statistics)

2021-12-17

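Supervised deep learning-based methods yield accurate results for medical image segmentation. However, they require large labeled datasets, and obtaining them is a laborious task that requires clinical expertise. Semi/self-supervised learning-based approaches address this limitation by exploiting unlabeled data along with limited annotated data. Recent self-supervised learning methods use a contrastive loss to learn good global-level representations from unlabeled images and achieve high performance on popular natural image datasets such as ImageNet. In pixel-level prediction tasks such as segmentation, it is crucial to also learn good local-level representations along with global representations to achieve better accuracy. However, the impact of existing local contrastive loss-based methods remains limited for learning good local representations, because similar and dissimilar local regions are defined based on random augmentations and spatial proximity rather than on the semantic labels of local regions, owing to the lack of large-scale expert annotations in the semi/self-supervised setting. In this paper, we propose a local contrastive loss to learn good pixel-level features useful for segmentation by exploiting semantic label information obtained from pseudo-labels of unlabeled images alongside limited annotated images. In particular, we define the proposed loss to encourage similar representations for pixels that have the same pseudo-label/label while being dissimilar to the representations of pixels with a different pseudo-label/label in the dataset. We perform pseudo-label-based self-training and train the network by jointly optimizing the proposed contrastive loss on both the labeled and unlabeled sets and the segmentation loss on only the limited labeled set. We evaluated the method on three public cardiac and prostate datasets and obtained high segmentation performance.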
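
The idea of pulling together pixels that share a (pseudo-)label and pushing apart the rest can be sketched with a generic label-aware contrastive loss. The formulation below follows the common supervised-contrastive pattern with cosine similarity and a temperature; it is an assumption made for illustration, and the paper's exact loss, sampling scheme, and hyperparameters may differ. The function name `label_contrastive_loss` is hypothetical, and feature vectors are assumed to be non-zero.

```python
import math
from typing import List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_contrastive_loss(features: List[List[float]], labels: List[int],
                           temperature: float = 0.1) -> float:
    """Average, over anchor pixels, of -log softmax similarity to same-label pixels."""
    n = len(features)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        logits = {j: cosine(features[i], features[j]) / temperature
                  for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Four toy pixel embeddings forming two clear clusters.
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = label_contrastive_loss(features, labels=[0, 0, 1, 1])
loss_inconsistent = label_contrastive_loss(features, labels=[0, 1, 0, 1])
```

When the labels agree with the feature clusters the loss is close to zero, and when they cut across the clusters it is large, which is the pressure that drives same-label pixels toward similar representations.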


Rotation Equivariant 3D Hand Mesh Generation from a Single RGB Image

Joshua Mitton, Chaitanya Kaul, Roderick Murray-Smith

Categories: Computer Vision | Machine Learning

2021-11-25

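We develop a rotation equivariant model for generating 3D hand meshes from 2D RGB images. This guarantees that as the input image of a hand is rotated, the generated mesh undergoes a corresponding rotation. Furthermore, this removes undesirable deformations in the meshes often generated by methods without rotation equivariance. By building a rotation equivariant model that accounts for the symmetries in the problem, we reduce the need for training on very large datasets to achieve good mesh reconstruction. The encoder takes images defined on $\mathbb{Z}^{2}$ and maps these to latent functions defined on the group $C_{8}$. We introduce a novel vector mapping function to map the function defined on $C_{8}$ to a latent point cloud space defined on the group $\mathrm{SO}(2)$. Further, we introduce a 3D projection function that learns a 3D function from the $\mathrm{SO}(2)$ latent space. Finally, we use an $\mathrm{SO}(3)$ equivariant decoder to ensure rotation equivariance. Our rotation equivariant model outperforms state-of-the-art methods on a real-world dataset, and we demonstrate that it accurately captures the shape and pose in the generated meshes under rotation of the input hand.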
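
Rotation equivariance means that rotating the input and then applying the model gives the same result as applying the model and then rotating the output, that is, $f(g \cdot x) = g \cdot f(x)$ for every group element $g$. The sketch below checks this property numerically for the eight rotations of $C_{8}$ on 2D points. The two toy maps are hypothetical stand-ins for a network, chosen only so that one satisfies the property and one does not.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def rotate(points: List[Point], k: int) -> List[Point]:
    """Rotate 2D points by k * 45 degrees, the k-th element of C_8."""
    angle = k * math.pi / 4
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def radial_squash(points: List[Point]) -> List[Point]:
    """Toy equivariant map: rescale each point by a function of its norm only."""
    return [(x / (1.0 + math.hypot(x, y)), y / (1.0 + math.hypot(x, y))) for x, y in points]


def shift_x(points: List[Point]) -> List[Point]:
    """Toy non-equivariant map: it singles out the x axis."""
    return [(x + 1.0, y) for x, y in points]


def is_equivariant(f: Callable[[List[Point]], List[Point]], points: List[Point],
                   tol: float = 1e-9) -> bool:
    """Check f(g . x) == g . f(x) for all eight rotations g in C_8."""
    for k in range(8):
        rotated_then_mapped = f(rotate(points, k))
        mapped_then_rotated = rotate(f(points), k)
        for (a, b), (c, d) in zip(rotated_then_mapped, mapped_then_rotated):
            if abs(a - c) > tol or abs(b - d) > tol:
                return False
    return True


points = [(1.0, 0.0), (0.5, 2.0), (-1.5, 0.25)]
squash_ok = is_equivariant(radial_squash, points)
shift_ok = is_equivariant(shift_x, points)
```

A model built from equivariant layers passes this kind of check by construction, so it does not need to see every rotation of a hand during training to handle rotated inputs consistently.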
